Add GET /run/trace/{run_id} endpoint #272
saathviksheerla wants to merge 4 commits into openml:main
Conversation
Walkthrough: Adds two RFC 9457 problem-detail errors (RunNotFoundError, RunTraceNotFoundError) to src/core/errors.py. Introduces src/database/runs.py with async helpers.
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff            @@
##             main     #272     +/-   ##
=========================================
  Coverage        ?   54.31%
=========================================
  Files           ?       37
  Lines           ?     1528
  Branches        ?      126
=========================================
  Hits            ?      830
  Misses          ?      696
  Partials        ?        2
```

☔ View full report in Codecov by Sentry.
Hey - I've found 2 issues, and left some high-level feedback:

- In `database.runs.get`, since the result is only used for existence checking, consider selecting a constant (e.g. `SELECT 1`) and using `scalar_one_or_none()` to reduce overhead and avoid returning an unused row object (see the sketch after this list).
- The `RunTrace.run_id` field uses `Field(serialization_alias="run_id")` even though the attribute name matches the alias; you can likely drop the alias for simplicity unless you specifically need different input/output names.
- The `selected=row.selected == "true"` mapping assumes the database stores the value as the exact string "true"; consider normalizing/handling boolean or other truthy representations to make this more robust against schema or data differences.
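To illustrate the first point, here is a minimal sketch of an existence check built around `SELECT 1` and `scalar_one_or_none()`. It is only a sketch; the actual helper in src/database/runs.py may be named and structured differently.

```python
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncConnection


async def run_exists(id_: int, expdb: AsyncConnection) -> bool:
    """Return True when a run with this ID is present in the `run` table."""
    result = await expdb.execute(
        text("SELECT 1 FROM `run` WHERE `rid` = :run_id LIMIT 1"),
        parameters={"run_id": id_},
    )
    # scalar_one_or_none() returns the constant 1 when a row matched, None otherwise.
    return result.scalar_one_or_none() is not None
```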
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `database.runs.get`, since the result is only used for existence checking, consider selecting a constant (e.g. `SELECT 1`) and using `scalar_one_or_none()` to reduce overhead and avoid returning an unused row object.
- The `RunTrace.run_id` field uses `Field(serialization_alias="run_id")` even though the attribute name matches the alias; you can likely drop the alias for simplicity unless you specifically need different input/output names.
- The `selected=row.selected == "true"` mapping assumes the database stores the value as the exact string "true"; consider normalizing/handling boolean or other truthy representations to make this more robust against schema or data differences.
## Individual Comments
### Comment 1
<location path="tests/routers/openml/runs_test.py" line_range="10-19" />
<code_context>
+
+
+@pytest.mark.parametrize("run_id", [34])
+async def test_get_run_trace_success(run_id: int, py_api: httpx.AsyncClient) -> None:
+ """Test that trace data is returned for a run that has trace entries."""
+ response = await py_api.get(f"/runs/trace/{run_id}")
+ assert response.status_code == HTTPStatus.OK
+ body = response.json()
+ assert body["run_id"] == run_id
+ assert isinstance(body["trace"], list)
+ assert len(body["trace"]) > 0
+ first = body["trace"][0]
+ assert "repeat" in first
+ assert "fold" in first
+ assert "iteration" in first
+ assert "setup_string" in first
+ assert "evaluation" in first
+ assert "selected" in first
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen success test by asserting types/values of key fields, not only their presence
Since the test only verifies key presence, it won’t catch type regressions from the DB layer to the API schema. Please also assert that `selected` is a boolean and that `evaluation` is either `None` or numeric, e.g.:
```python
assert isinstance(first["selected"], bool)
assert (first["evaluation"] is None) or isinstance(first["evaluation"], (int, float))
```
This helps detect accidental changes in the response shape earlier.
</issue_to_address>
### Comment 2
<location path="tests/routers/openml/runs_test.py" line_range="28-33" />
<code_context>
+
+
+@pytest.mark.parametrize("run_id", [24])
+async def test_get_run_trace_no_trace(run_id: int, py_api: httpx.AsyncClient) -> None:
+ """Test that 412 is returned for a run that exists but has no trace."""
+ response = await py_api.get(f"/runs/trace/{run_id}")
+ assert response.status_code == HTTPStatus.PRECONDITION_FAILED
+ body = response.json()
+ assert body["code"] == "572"
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Extend `no_trace` test to validate the full problem details shape, not just the code
In addition to status and error `code`, please assert key fields from the problem details payload (e.g. `type`/`title`/`status`) so the error contract is fully covered and accidental changes are caught earlier. For example:
```python
assert body["status"] == HTTPStatus.PRECONDITION_FAILED
assert body["type"].endswith("/problems/run-trace-not-found")
assert body["title"] == "Run Trace Not Found"
```
It would also be helpful to mirror this in the `run_not_found` test using the corresponding `RunNotFoundError` values.
</issue_to_address>
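For the `run_not_found` counterpart mentioned above, a sketch of how the mirrored assertions could look. The `type`/`title` values are placeholders; use whatever `RunNotFoundError` actually emits, and the run id follows the existing parametrization in this test module.

```python
from http import HTTPStatus

import httpx
import pytest


@pytest.mark.parametrize("run_id", [999999])
async def test_get_run_trace_run_not_found(run_id: int, py_api: httpx.AsyncClient) -> None:
    """A run that does not exist should produce the RunNotFoundError problem details."""
    response = await py_api.get(f"/runs/trace/{run_id}")
    assert response.status_code == HTTPStatus.PRECONDITION_FAILED
    body = response.json()
    assert body["code"] == "571"
    assert body["status"] == HTTPStatus.PRECONDITION_FAILED
    # Placeholder values below: align with the actual RunNotFoundError type/title.
    assert body["type"].endswith("/problems/run-not-found")
    assert body["title"] == "Run Not Found"
```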
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/database/runs.py (1)
30-33: Add explicit trace ordering for deterministic responses. The trace query currently relies on DB default row order. Please sort explicitly so client output is stable and reproducible.
Proposed fix
```diff
 SELECT `repeat`, `fold`, `iteration`, `setup_string`, `evaluation`, `selected`
 FROM `trace`
 WHERE `run_id` = :run_id
+ORDER BY `repeat`, `fold`, `iteration`
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database/runs.py` around lines 30 - 33, The trace SELECT in src/database/runs.py relies on unspecified DB ordering; update the query that selects `repeat`, `fold`, `iteration`, `setup_string`, `evaluation`, `selected` for a given `run_id` to include an explicit ORDER BY (e.g., ORDER BY `repeat`, `fold`, `iteration`, `selected`) so results are deterministic and reproducible; locate the SQL string used to fetch traces and append the ORDER BY clause to it (keeping the WHERE `run_id` = :run_id filter intact).

tests/routers/openml/runs_test.py (1)
18-25: Strengthen success-path assertions for `selected`. Please assert the accepted `selected` values, not just key presence, so contract regressions are caught.

Proposed fix

```diff
 first = body["trace"][0]
 @@
 assert "selected" in first
+assert first["selected"] in {"true", "false"}
```

Based on learnings: enum-like values in this API should be exposed as literal strings ("true"/"false"), so tests should verify that explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/runs_test.py` around lines 18 - 25, The test currently only checks presence of the "selected" key on the first trace entry; update the assertion to verify the value is one of the literal strings "true" or "false" so regressions are caught—locate the variable first = body["trace"][0] in tests/routers/openml/runs_test.py and replace the key-presence assert for "selected" with an assertion that first["selected"] is in the allowed set {"true", "false"} (i.e., explicitly check the string value, not just existence).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/openml/runs.py`:
- Line 40: The response mapping currently coerces the DB enum-like literal by
doing selected=row.selected == "true"; instead, preserve the original DB value
and pass row.selected through unchanged (replace the boolean coercion with the
literal), updating the mapping where "selected" is assigned (the expression
using row.selected) so the API shape remains the original string
("true"/"false").
In `@src/schemas/runs.py`:
- Line 14: The `selected` field in the runs schema is currently typed as bool
but must preserve literal string values "true"/"false"; change its type to
typing.Literal["true", "false"] (or str with a validator enforcing those exact
values) in the schema definition (the `selected` field in the model declared in
src/schemas/runs.py) and add a Pydantic validator if needed to ensure only
"true"/"false" are accepted and returned as strings so the response contract is
preserved.
---
Nitpick comments:
In `@src/database/runs.py`:
- Around line 30-33: The trace SELECT in src/database/runs.py relies on
unspecified DB ordering; update the query that selects `repeat`, `fold`,
`iteration`, `setup_string`, `evaluation`, `selected` for a given `run_id` to
include an explicit ORDER BY (e.g., ORDER BY `repeat`, `fold`, `iteration`,
`selected`) so results are deterministic and reproducible; locate the SQL string
used to fetch traces and append the ORDER BY clause to it (keeping the WHERE
`run_id` = :run_id filter intact).
In `@tests/routers/openml/runs_test.py`:
- Around line 18-25: The test currently only checks presence of the "selected"
key on the first trace entry; update the assertion to verify the value is one of
the literal strings "true" or "false" so regressions are caught—locate the
variable first = body["trace"][0] in tests/routers/openml/runs_test.py and
replace the key-presence assert for "selected" with an assertion that
first["selected"] is in the allowed set {"true", "false"} (i.e., explicitly
check the string value, not just existence).
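To make the schema suggestion in the inline comments above concrete, here is a minimal sketch of a trace-entry model with `selected` kept as a literal string. The model and field names are inferred from the columns quoted in this review, not from the actual code in src/schemas/runs.py.

```python
from typing import Literal

from pydantic import BaseModel


class TraceIteration(BaseModel):
    """One trace entry; field names mirror the trace columns quoted in this review."""

    repeat: int
    fold: int
    iteration: int
    setup_string: str
    evaluation: float | None
    # Kept as the literal DB strings so the JSON response matches the old PHP contract.
    selected: Literal["true", "false"]
```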
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2084608b-d09f-4e37-af42-5c84b9ffd054
📒 Files selected for processing (6)
- src/core/errors.py
- src/database/runs.py
- src/main.py
- src/routers/openml/runs.py
- src/schemas/runs.py
- tests/routers/openml/runs_test.py
Hey - I've left some high-level feedback:

- In `database.runs.get`, consider returning a simple boolean (or using `scalar_one_or_none()` with `SELECT 1 ... LIMIT 1`) instead of a `Row | None`, since the query is only used as an existence check and this would make the intent and typing clearer.
- The `get_trace` function currently returns a generic `Sequence[Row]`; if you add a typed `Protocol`/`TypedDict` or a small data class for the expected columns, you can get better type safety and make the mapping in `get_run_trace` more robust to schema changes (a sketch follows after this list).
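As a sketch of the typed-row idea, assuming the column names quoted in the SQL elsewhere in this thread (adjust to the real schema):

```python
from typing import TypedDict


class TraceRow(TypedDict):
    """Typed view of one `trace` row as returned by get_trace."""

    repeat: int
    fold: int
    iteration: int
    setup_string: str
    evaluation: float | None
    selected: str  # stored as the literal strings "true"/"false"
```

`get_trace` could then be annotated as returning `list[TraceRow]`, so the mapping in `get_run_trace` breaks at type-check time if a column is renamed.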
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `database.runs.get`, consider returning a simple boolean (or using `scalar_one_or_none()` with `SELECT 1 ... LIMIT 1`) instead of a `Row | None`, since the query is only used as an existence check and this would make the intent and typing clearer.
- The `get_trace` function currently returns a generic `Sequence[Row]`; if you add a typed `Protocol`/`TypedDict` or a small data class for the expected columns, you can get better type safety and make the mapping in `get_run_trace` more robust to schema changes.
🧹 Nitpick comments (1)
tests/routers/openml/runs_test.py (1)
11-41: Reduce seed-data coupling in endpoint tests. These tests depend on fixed seeded IDs (Line 11, Line 28, Line 40). Consider arranging run/trace rows inside the test transaction (or via fixtures) so they stay deterministic if seed SQL changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/runs_test.py` around lines 11 - 41, The tests test_get_run_trace_success, test_get_run_trace_no_trace (and the parametrized case in test_get_run_trace_run_not_found) are coupled to seeded run IDs (34, 24, 999999); instead insert or create the required run and trace rows inside the test or via a fixture (transaction-scoped or test-scoped factory) and use the created run_id in the GET call to /runs/trace/{run_id}, or create a run without traces for the PRECONDITION_FAILED case; ensure cleanup by using the test DB transaction/fixture so tests remain deterministic if seed SQL changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/routers/openml/runs_test.py`:
- Around line 11-41: The tests test_get_run_trace_success,
test_get_run_trace_no_trace (and the parametrized case in
test_get_run_trace_run_not_found) are coupled to seeded run IDs (34, 24,
999999); instead insert or create the required run and trace rows inside the
test or via a fixture (transaction-scoped or test-scoped factory) and use the
created run_id in the GET call to /runs/trace/{run_id}, or create a run without
traces for the PRECONDITION_FAILED case; ensure cleanup by using the test DB
transaction/fixture so tests remain deterministic if seed SQL changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 74f66776-f5ed-43cc-ad45-a5f9b1a40f25
📒 Files selected for processing (4)
- src/database/runs.py
- src/routers/openml/runs.py
- src/schemas/runs.py
- tests/routers/openml/runs_test.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/database/runs.py
- src/schemas/runs.py
PGijsbers
left a comment
Thanks for taking the time to contribute! The changes look good though I did leave a remark to update one function. In addition, I want tests added that compare the output of the new Python-based implementation to that of the old PHP responses.
Please add a test module under tests/routers/openml/migration that compares the responses. It should make sure that we have known "translations" for the runs on the test server; when comparing the responses there can be differences (e.g., error handling is different), but the test should clearly outline those differences and force equality otherwise. See for example the tasks_migration_test.py file.
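As a rough sketch of such a comparison test: the fixture names `py_api`/`php_api` and the PHP endpoint path are assumptions modeled on the existing migration tests, and the known, intentional differences still need to be spelled out per endpoint.

```python
from http import HTTPStatus

import deepdiff
import httpx
import pytest


@pytest.mark.parametrize("run_id", [34])
async def test_get_run_trace_equal_to_php(
    run_id: int,
    py_api: httpx.AsyncClient,
    php_api: httpx.AsyncClient,
) -> None:
    """Compare the new Python response against the old PHP response for a seeded run."""
    new_response = await py_api.get(f"/runs/trace/{run_id}")
    php_response = await php_api.get(f"/run/trace/{run_id}")
    assert new_response.status_code == HTTPStatus.OK
    assert php_response.status_code == HTTPStatus.OK

    # Any known, intentional differences (e.g. error handling) should be normalized
    # or documented here before asserting equality.
    differences = deepdiff.DeepDiff(new_response.json(), php_response.json(), ignore_order=True)
    assert not differences
```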
src/database/runs.py
Outdated
```python
async def get(id_: int, expdb: AsyncConnection) -> Row | None:
    """Check if a run exists by ID."""
    row = await expdb.execute(
        text(
            """
            SELECT 1
            FROM `run`
            WHERE `rid` = :run_id
            """,
        ),
        parameters={"run_id": id_},
    )
    return row.one_or_none()
```
```diff
-async def get(id_: int, expdb: AsyncConnection) -> Row | None:
+async def exist(id_: int, expdb: AsyncConnection) -> bool:
     """Check if a run exists by ID."""
     row = await expdb.execute(
         text(
             """
             SELECT 1
             FROM `run`
             WHERE `rid` = :run_id
             """,
         ),
         parameters={"run_id": id_},
     )
-    return row.one_or_none()
+    return bool(row.one_or_none())
```
I saw that sourcery suggested to use SELECT 1. That is fine, but at this point the function doesn't do what the function name says it does. Update callsites accordingly.
Thanks for the review! Will rename get to exist returning bool and update the callsite. Will also add a migration test under tests/routers/openml/migration/ comparing PHP and Python responses for known runs.
🧹 Nitpick comments (2)
tests/routers/openml/migration/runs_migration_test.py (2)
38-42: `ignore_order=True` may mask ordering regressions. If trace row order is contractually meaningful, this can hide real mismatches (Line 41). Consider sorting both payloads by `(repeat, fold, iteration)` first, then compare with an order-sensitive diff.

Proposed adjustment

```diff
-    differences = deepdiff.diff.DeepDiff(
-        new_json,
-        php_response.json(),
-        ignore_order=True,
-    )
+    def _sorted_trace(payload: dict) -> dict:
+        copied = payload.copy()
+        copied["trace"] = copied["trace"].copy()
+        copied["trace"]["trace_iteration"] = sorted(
+            copied["trace"]["trace_iteration"],
+            key=lambda row: (int(row["repeat"]), int(row["fold"]), int(row["iteration"])),
+        )
+        return copied
+
+    differences = deepdiff.diff.DeepDiff(
+        _sorted_trace(new_json),
+        _sorted_trace(php_response.json()),
+        ignore_order=False,
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/migration/runs_migration_test.py` around lines 38 - 42, The current DeepDiff call uses ignore_order=True which can hide ordering regressions; instead, sort both payload lists by the tuple keys (repeat, fold, iteration) before diffing: extract the trace rows from new_json and php_response.json(), sort each list using a key function that returns (row['repeat'], row['fold'], row['iteration']) (or safe-access those fields), then call deepdiff.DeepDiff on the sorted lists without ignore_order=True (update the DeepDiff invocation where differences = deepdiff.diff.DeepDiff(...), and reference new_json, php_response.json(), and the differences variable).
13-13: Consider naming the staged run-id intent explicitly.
`34` is a magic value right now (Line 13). Adding an `ids=` label or a named constant would make fixture/staging intent clearer during failures (a tiny illustration follows below).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/migration/runs_migration_test.py` at line 13, The parametrize uses a magic literal for run_id (pytest.mark.parametrize("run_id", [34])); replace it with an explicit intent label by either defining a named constant (e.g., STAGED_RUN_ID = 34 and parametrize with [STAGED_RUN_ID]) or adding an ids parameter to the parametrize call (e.g., ids=["staged-run-34"]) so failures clearly show the staged run intent; update references to run_id in the test to use the chosen constant or keep the existing variable name if using ids.
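A tiny illustration of the `ids=` variant; the constant name is made up, and the real test would keep its existing body and fixtures.

```python
import pytest

STAGED_RUN_ID = 34  # hypothetical constant naming the seeded run this test relies on


@pytest.mark.parametrize("run_id", [STAGED_RUN_ID], ids=["staged-run-with-trace"])
def test_staged_run_id(run_id: int) -> None:
    # The explicit id makes the staged-data intent visible in pytest output on failure.
    assert run_id == STAGED_RUN_ID
```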
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/routers/openml/migration/runs_migration_test.py`:
- Around line 38-42: The current DeepDiff call uses ignore_order=True which can
hide ordering regressions; instead, sort both payload lists by the tuple keys
(repeat, fold, iteration) before diffing: extract the trace rows from new_json
and php_response.json(), sort each list using a key function that returns
(row['repeat'], row['fold'], row['iteration']) (or safe-access those fields),
then call deepdiff.DeepDiff on the sorted lists without ignore_order=True
(update the DeepDiff invocation where differences = deepdiff.diff.DeepDiff(...),
and reference new_json, php_response.json(), and the differences variable).
- Line 13: The parametrize uses a magic literal for run_id
(pytest.mark.parametrize("run_id", [34])); replace it with an explicit intent
label by either defining a named constant (e.g., STAGED_RUN_ID = 34 and
parametrize with [STAGED_RUN_ID]) or adding an ids parameter to the parametrize
call (e.g., ids=["staged-run-34"]) so failures clearly show the staged run
intent; update references to run_id in the test to use the chosen constant or
keep the existing variable name if using ids.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f0eab104-d74c-4a58-845b-cc4f1a06ad48
📒 Files selected for processing (3)
- src/database/runs.py
- src/routers/openml/runs.py
- tests/routers/openml/migration/runs_migration_test.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/routers/openml/runs.py
- src/database/runs.py
Description
Implements `GET /run/trace/{run_id}` as part of the run endpoints.
Fixes: #40
Related: #36
Matches PHP API behavior for error codes (571/572) and response shape.
Returns 412 with code 571 if the run does not exist, 412 with code 572 if the run exists but has no trace, and the trace rows on success.
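As a rough sketch of that flow (handler, schema, and error names are taken from elsewhere in this PR, but the import paths and constructor signatures are assumptions):

```python
from sqlalchemy.ext.asyncio import AsyncConnection

from core.errors import RunNotFoundError, RunTraceNotFoundError  # assumed import path
from database import runs as database_runs  # assumed import path
from schemas.runs import RunTrace  # assumed import path


async def get_run_trace(run_id: int, expdb: AsyncConnection) -> RunTrace:
    """Sketch of the handler flow; the FastAPI route wiring and DI are omitted."""
    if not await database_runs.exist(run_id, expdb):
        raise RunNotFoundError(run_id)  # -> 412 with code 571

    rows = await database_runs.get_trace(run_id, expdb)
    if not rows:
        raise RunTraceNotFoundError(run_id)  # -> 412 with code 572

    # Each DB row maps onto one trace entry in the response.
    return RunTrace(
        run_id=run_id,
        trace=[
            {
                "repeat": row.repeat,
                "fold": row.fold,
                "iteration": row.iteration,
                "setup_string": row.setup_string,
                "evaluation": row.evaluation,
                "selected": row.selected,  # kept as the literal "true"/"false" string
            }
            for row in rows
        ],
    )
```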
Checklist
Please check all that apply. You can mark items as N/A if they don't apply to your change.
Always:
Required for code changes:
If applicable:
/docs)
Extra context: